Team Bee (Beta Regressionists) (Advisor: Dr. Seals)
Regression analysis is a statistical tool used to explore relationships between variables.
Beta Regression: Used when the dependent variable is a ratio or percentage constrained to the open interval (0, 1).
Why not Linear Regression?
Beta distribution: Assumes the outcome follows a beta distribution, which is flexible for variables limited to (0, 1).
Precision Parameter (\phi): Allows control over the variance of the outcome, enabling flexibility for data with differing levels of dispersion.
The PDF of a random variable with a beta distribution is as follows.
f(y) = \begin{cases} \frac{y^{\alpha-1}(1-y)^{\beta-1}}{B(\alpha,\beta)}, & 0 \le y \le 1 \\ 0, & \text{elsewhere} \end{cases}
where B(\alpha,\beta) = \int_0^1 y^{\alpha-1}(1-y)^{\beta-1} \ dy = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha+\beta)}.
\alpha and \beta are the shape parameters, where \alpha > 0 and \beta > 0. [1]
E[Y] = \mu = \frac{\alpha}{\alpha+\beta} \ \ \ \text{and} \ \ \ V[Y] = \sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} [1]
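As a quick sanity check, the mean and variance formulas above can be compared against numerical integration of the density. This is a minimal sketch in base R; the shape values are hypothetical and chosen only for illustration.

```r
# Hypothetical shape parameters (alpha and beta in the formulas above).
shape_a <- 2
shape_b <- 5

# Closed-form mean and variance.
mean_formula <- shape_a / (shape_a + shape_b)
var_formula  <- shape_a * shape_b /
  ((shape_a + shape_b)^2 * (shape_a + shape_b + 1))

# Numerical moments: integrate y * f(y) and y^2 * f(y) over (0, 1).
m1 <- integrate(function(y) y   * dbeta(y, shape_a, shape_b), 0, 1)$value
m2 <- integrate(function(y) y^2 * dbeta(y, shape_a, shape_b), 0, 1)$value
mean_numeric <- m1
var_numeric  <- m2 - m1^2

print(c(formula = mean_formula, numeric = mean_numeric))
print(c(formula = var_formula,  numeric = var_numeric))
```

The two columns should agree up to the tolerance of `integrate()`.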
For beta regression, it is useful to introduce the following reparameterization:
\mu = \frac{\alpha}{\alpha+\beta}, \qquad \phi = \alpha + \beta
\mu is the mean of the response, and for a fixed \mu, the larger \phi is, the smaller the variance (the less spread out the PDF). [2]
f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu \phi) \Gamma((1 - \mu)\phi)} y^{\mu\phi - 1}(1 - y)^{(1 - \mu)\phi - 1}, \quad 0 < y < 1
Where:
\text{Var}(Y) = \frac{\mu(1 - \mu)}{1 + \phi}
When \mu is near the extremes, 0 or 1, the variance drops. [3]
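The variance expression above follows directly from the shape-parameter form. Solving the definitions of \mu and \phi for \alpha and \beta and substituting into V[Y] gives:

```latex
\begin{aligned}
\alpha &= \mu\phi, \qquad \beta = (1-\mu)\phi, \\
V[Y] &= \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
      = \frac{\mu\phi \cdot (1-\mu)\phi}{\phi^2(\phi+1)}
      = \frac{\mu(1-\mu)}{1+\phi}.
\end{aligned}
```

The same substitution turns the exponents \alpha - 1 and \beta - 1 of the original density into \mu\phi - 1 and (1-\mu)\phi - 1.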
Bias Correction/Reduction - Type of Estimator:
ML (Maximum Likelihood): The standard method; useful, but estimates may be biased, particularly in small samples. [4]
BC (Bias-Corrected): Subtracts an estimate of the first-order bias from the ML estimates.
BR (Bias-Reduced): Solves adjusted score equations so that the resulting estimates have reduced bias.
Beta Regression Trees
This extension uses recursive partitioning to model data that might exhibit subgroup-specific relationships.
It builds decision trees by splitting data into different subgroups based on the instability of model parameters across partitioning variables.
Model Approach:
Beta Regression was used to model suicide rates as a function of socio-economic factors, appropriate for data bounded between 0 and 1.
Dataset: Suicide Rates Overview 1985 to 2016, with variables like HDI, GDP per capita, sex, age group, and generation.
Cleaned data by removing outliers using Cook’s distance and leverage analysis.
Managed missing values and calculated descriptive statistics.
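The outlier-screening step can be sketched with base R's influence measures. This is only an illustration under stated assumptions: the data are simulated, an ordinary linear model stands in for the actual model, and the cutoffs (4/n for Cook's distance, 2p/n for leverage) are common rules of thumb, not necessarily the thresholds used in this analysis.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; not the suicide-rate dataset).
n <- 100
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
d$y[1] <- 30                      # plant one gross outlier

fit <- lm(y ~ x, data = d)

# Influence measures from base R's stats package.
cook <- cooks.distance(fit)
lev  <- hatvalues(fit)

# Rule-of-thumb cutoffs (assumptions; other thresholds are also in use).
p <- length(coef(fit))
flag_cook <- cook > 4 / n
flag_lev  <- lev  > 2 * p / n

# Drop rows flagged by either criterion.
d_clean <- d[!(flag_cook | flag_lev), ]
cat("Rows removed:", n - nrow(d_clean), "\n")
```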
Model Details:
Incorporated interaction terms and adjusted precision (phi) to account for variance differences across groups.
Used beta regression trees to capture nonlinear relationships.
Evaluation:
Model performance assessed via pseudo R-squared.
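One common definition of pseudo R-squared for beta regression, and the one reported by the betareg package, is the squared correlation between the fitted linear predictor and the link-transformed response. The sketch below assumes a logit link; the response and linear-predictor values are hypothetical.

```r
# Pseudo R-squared as the squared correlation between the fitted linear
# predictor (eta) and the logit-transformed response.
pseudo_r2 <- function(eta, y) {
  stopifnot(all(y > 0 & y < 1))   # logit is undefined at 0 and 1
  cor(eta, qlogis(y))^2
}

# Hypothetical responses and linear predictors, for illustration only.
y   <- c(0.10, 0.25, 0.40, 0.55, 0.80, 0.90)
eta <- c(-2.0, -1.2, -0.5, 0.3, 1.1, 2.4)

print(pseudo_r2(eta, y))
```

A value near 1 means the linear predictor tracks the transformed response closely; unlike the R-squared of a linear model, it is not a share of explained variance.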
Software: R for data management and analysis.
Model Rationale:
ReadingSkills dataset (N=44): Examines reading scores (0.0–1.0) for 44 children, including 19 with dyslexia and 25 without.
Beta regression models the response variable within the (0, 1) range, which suits the bounded reading scores better than ordinary linear regression with normal errors.
Data transformation:
Extended Beta Regression - Bias Correction
pending
Beta Regression Trees
The beta regression tree shows HDI_year as a key predictor, with specific thresholds creating groupings where higher HDI_year values link to better outcomes and smaller nodes show more variability.
Model Diagnostics
The package betareg allows users to perform both fixed and variable dispersion beta regression. The model is based on the beta distribution, using a parameterization with the mean and precision. [5]
(add plots)
(graph)
Regressors Dataset Tweaking
Remember that the dependent variable must lie in the open interval (0, 1).
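When observed values include exact 0s or 1s, one widely cited adjustment (Smithson and Verkuilen, 2006) compresses them into the open interval using y' = (y(n - 1) + 0.5) / n, where n is the sample size. The sketch below is an assumption about how such a step could look, not necessarily the transformation applied to this dataset; `squeeze_unit` and the scores are hypothetical.

```r
# Compress values from [0, 1] into (0, 1): y' = (y * (n - 1) + 0.5) / n
squeeze_unit <- function(y) {
  stopifnot(all(y >= 0 & y <= 1))
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

# Hypothetical scores that include the boundary values 0 and 1.
scores   <- c(0, 0.35, 0.5, 0.82, 1)
adjusted <- squeeze_unit(scores)
print(adjusted)
```

The shift shrinks as n grows, so large samples are barely altered.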
Beta Regression Fitting
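library(betareg)
# Beta regression: mean model before "|", precision model after it.
# type = "BC" requests bias-corrected estimates.
betareg(
  formula = accuracy ~ dcode * iq | dcode + iq,
  data = ReadingSkillsModel,
  type = "BC"
)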
# Gaussian GLM with a logit link, fitted for comparison.
glm(
  formula = accuracy ~ dcode * iq,
  family = gaussian(link = "logit"),
  data = ReadingSkillsModel
)
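To make the mean-precision likelihood concrete, the following base R sketch fits a beta regression by direct maximum likelihood with `optim()`. It is a simplified illustration, not betareg's implementation: the data are simulated, the precision \phi is held constant (no precision submodel), and no bias correction is applied.

```r
set.seed(42)

# Simulated data (hypothetical): logit(mu) = b0 + b1 * x, constant phi.
n   <- 500
x   <- runif(n, -2, 2)
b   <- c(0.5, 1.0)
phi <- 20
mu  <- plogis(b[1] + b[2] * x)
y   <- rbeta(n, mu * phi, (1 - mu) * phi)

# Negative log-likelihood in the mean-precision parameterization.
# par = (b0, b1, log(phi)); working with log(phi) keeps phi positive.
negloglik <- function(par) {
  mu_hat  <- plogis(par[1] + par[2] * x)
  phi_hat <- exp(par[3])
  -sum(dbeta(y, mu_hat * phi_hat, (1 - mu_hat) * phi_hat, log = TRUE))
}

fit <- optim(c(0, 0, 0), negloglik, method = "BFGS")
est <- c(b0 = fit$par[1], b1 = fit$par[2], phi = exp(fit$par[3]))
print(round(est, 3))
```

The estimates should land close to the simulated values (0.5, 1.0, and 20), with the precision estimate being the noisiest of the three.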
Results for Normal Children
Results for Dyslexic Children
Pending
Table 2: Association of Reading Skills Score with IQ and presence of Dyslexia

| Variable | β (beta regression) | SE (beta regression) | p (beta regression) | β (general linear regression) | SE (general linear regression) | p (general linear regression) |
|---|---|---|---|---|---|---|
| Dyslexia | -1.446 | 0.2954 | 9.767e-07 | -1.598 | 0.2448 | 8.565e-08 |
| IQ (Z-score) | 1.049 | 0.2718 | 0.0001132 | 0.4851 | 0.2916 | 0.104 |
| Dyslexia × IQ | -1.144 | 0.2768 | 3.593e-05 | -0.5463 | 0.3145 | 0.09001 |
Dyslexia’s effect on scores
A child’s odds of answering a reading skills question correctly decrease by a factor of e^{1.446} \approx 4.25 if they are dyslexic, assuming average IQ (Z-score of 0).
IQ’s effect on scores
If a non-dyslexic child’s IQ increases by 1 standard deviation, their odds of answering a reading skills question correctly increase by a factor of e^{1.049} \approx 2.85.
If a dyslexic child’s IQ increases by 1 standard deviation, their odds of answering a reading skills question correctly decrease by a factor of e^{0.095} \approx 1.10, because the IQ slope for dyslexic children is the sum of the IQ and interaction coefficients:
1.049 - 1.144 = -0.095
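The factors quoted above come from exponentiating the logit-scale coefficients in the beta regression column of Table 2. The sketch below reproduces that arithmetic; it assumes the dyslexia indicator is coded 0/1, which the slides do not state explicitly.

```r
# Coefficients from the beta regression column of Table 2 (logit scale).
b_dyslexia    <- -1.446
b_iq          <-  1.049
b_interaction <- -1.144

# Multiplicative change in the odds for each effect.
or_dyslexia    <- exp(b_dyslexia)            # dyslexic vs. non-dyslexic at IQ z = 0
or_iq_control  <- exp(b_iq)                  # +1 SD IQ, non-dyslexic child
or_iq_dyslexic <- exp(b_iq + b_interaction)  # +1 SD IQ, dyslexic child

print(round(c(dyslexia    = or_dyslexia,
              iq_control  = or_iq_control,
              iq_dyslexic = or_iq_dyslexic), 3))
```

A factor below 1 means the odds shrink; its reciprocal gives the "decreases by a factor of" form used in the text.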